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As the era of high precision cosmology approaches, the empirically determined power spectrum 
of the microwave background anisotropy Ci will provide a crucial test for cosmological theories. We 
present an exact semi-analytic framework for the study of the ampling statistics of the C; resulting 
from observations with partial sky coverage and anisotropic noise distributions. This includes space- 
borne, air-borne and ground-based experiments. We apply this theory to demonstrate its power 
for constructing fast but unbiased approximate methods for the joint estimation of cosmological 
' parameters. Further applications, such as a test for possible non-Gaussianity of the underlying 

theory and a "poor man's power spectrum estimator" are suggested. An appendix derives recursion 
relations for the efficient computation of the couplings between spherical harmonics on the cut sky. 



o 
o 
o 



< 



oo 
o 



Of 
i 

o 

H 



I. INTRODUCTION 



During the next few years ground based observations and balloon missions [1] as well as satellite observation [2,3] 
promise exquisite determinations of the angular distribution of cosmic microwave background (CMB) anisotropics. 

Inflationary cosmogonies, the most popular theories of structure formation, predict a statistically isotropic CMB 
with Gaussian fluctuations. To linear order in perturbation theory this means that within this class of models all 
cosmologically relevant information contained in the map of anisotropics is distilled in the angular power spectrum 
( '/• 

An accurate estimate of these quantities therefore promises to have tremendous impact on our knowledge of the 
global properties of the universe and the physical processes which lead to the formation of structure [4-7]. This 
knowledge is quantified in terms of cosmological parameters such as fltot, ^6, Hq, etc. 

The Ci are uniquely defined as rotationally invariant quadratic combinations of the coefficients of an expansion of 
the anisotropy in spherical harmonics. Their useful statistical properties rest on the the fact that the spherical 
• • , harmonics are a complete and orthogonal basis set on the sphere. However, in realistic experiments, only part of the 
celestial sphere is covered, if only because our CMB sky is obscured by the Milky Way. Also, due the particulars 
of the observational strategy, non-uniform and possibly correlated noise contaminates the observed signal. In order 
' to interpret past observations and forecast the impact of future observations we need a framework for studying the 
5^ i sampling statistics of the Ci for incomplete sky coverage in the presence of non-uniform noise. 

It is worth noting that there are several different kinds of Ci which we will be referring to in this paper. 
The stochastic nature of the observed C; is illustrated in Figure 1, where the straightforward computation of C'i 
from a subset of our Monte Carlo simulations of incomplete and noisy skies are overlaid on the theoretical predictions. 
One or more of the following simplifying assumptions can be invoked in cases where one would like to avoid the full 
complexity of the problem (e.g. [4,5,8,9]): 

• (Independence) The observed C; are approximately independent due to nearly full and uniform sky coverage, 

• (Scaled x 2 ) Their sampling distributions do not change appreciably from the \ 2 distributions which apply for 
the full sky, apart from a rescaling of the number of degrees of freedom by the sky fraction /, and 

• (Gaussianity) Their sampling distributions are well-approximated by Gaussian distributions which have the 
same first and second moments as these rescaled x 2 distributions. 

There are several reasons to carefully assess these approximations and, if necessary, go beyond them. One reason is 
that in the short term balloon and ground based experiments will provide the leading edge science results to the field. 
Due to practical limitations we cannot hope to even come close to full sky coverage with these types of experiments. 
The observation regions are often ring-shaped or cover a circular region of the sky which subtends a small solid angle. 
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FIG. 1. The dots are 20 of our 3328 Monte Carlo simulations. On these we superpose the Monte Carlo average (yellow) and 
the theoretical Ci (black dashed), which are the sum of signal (blue, solid) and noise (blue, dashed). The discrepancy seen at 
low I, and to a lesser degree in the inset, demonstrates the effect of correlations due to partial sky coverage even for a satellite 
experiment capturing 70% of the CMB sky. 



It is an urgent matter to assess statistically how these imminent experiments can constrain the power spectrum and 
cosmological parameters. 

Further, the planned satellite missions are aiming to determine the Ci to sub-percentage accuracy. In the case 
of the Planck Surveyor mission, this has given us the hope of detecting small effects such as secondary anisotropies 
which are due to nonlinear gravitational effects on the CMB photons during the free-streaming epoch. An example 
of such an effect is anisotropies due to gravitational lcnsing. This is an important issue because a detection would 
break otherwise present parameter degeneracies and allow a consistent parameter estimation from CMB data alone 
[10]. 

More generally, the impact CMB observations will have on cosmology makes it important to use approximations in a 
controlled way. For example, in the analysis of COBE-DMR data it was realized that using Gaussian approximations 
for quadratic quantities introduced systematic biases [II]. At the same time, we need approximations to make feasible 
the analysis of the huge CMB data sets we expect in the coming years. 

The Maximum Likelihood Inversion (MLI) method (cf. [12-15] for applications in a cosmological context) is the 
standard framework for solving this type of problem. To illustrate, we write down the likelihood functional for a 
Gaussian sky. In order to do so we need to define the vector of observed pixel temperatures on the sky d with N p i X 
components. A given theory and observational strategy on the sky imply the covariance structure of the elements di 
in terms of signal and noise covariance matrices S and N such that 

(d i d j )=S ij + N ij . (1) 

The signal covariance matrix S is simply a discretised version of the theoretical angular correlation function 

1=2 

evaluated at the pixel locations. The noise covariance N reflects those imperfections of the experiment and observa- 
tional strategy which lead to spatial correlations in the projected noise. 
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FIG. 2. Annuli, circular patches such as polar caps and even disjoint geometries such as the full sky with an annulus (such 
as a Galactic cut) removed are within the class for which our formalism is exact. 

This leads to an expression for the likelihood 



The matrix S is a function of predicted Cj, which are in turn determined by the parameters of some theory. The 
matrix N includes a model of the expected noise covariance given the experimental setup. The structures of both S 
and N also depend on the observed region on the sky. 

For a given set of data the likelihood is then considered as a function of the parameters of a given theory. The 
maximum likelihood parameter estimates are determined as those which maximise Eq. (3). While these manipulations 
are conceptually simple, the practical difficulties are apparent. The number of operations needed to evaluate Eq. (3) 
(or equivalcntly an estimator derived from it) scales as the most costly operations contained in it. These are the 
determinant evaluation in the denominator and the matrix inverse in the exponent. For a general situation both of 
these require N^ ix operations. 

In the near future N p i X will be 10 5 from balloon experiments (such as Maxima), then 10 6 for MAP and in less than 
a decade 10 7 -10 8 for Planck. These numbers imply 10 15 -10 24 computations. Even when considering future growth in 
computing performance, estimates of the computational time required to evaluate Eq. (3) a single time reach up to 
many years [16]. To compute the maximum likelihood estimator requires many likelihood evaluations and would thus 
require an unfeasibly long period of time. For practical purposes, straightforward applications of the MLI method 
therefore fail to give answers in finite computational time. 

Previous studies of this subject have usually directly addressed the problem of "solving for the C/". Tcgmark [17] 
rejects the maximum likelihood estimator on the grounds that it is too time-consuming to compute and suggests 
a minimum variance (and hence "optimal") method based on a quadratic estimator which makes a form of the 
Gaussianity assumption above. While this method is powerful in the regime where it is applicable, it does not provide 
a means of assessing the validity of this assumption. Within our analysis we can make the connection between the 
minimum variance and the maximum likelihood methods (cf. subsection III A). 

The approach of Bond, Jaffe and Knox [18] is again very different from ours. It consists in finding an approximate 
form for the likelihood and parametrising the shape of the power spectrum in terms of a set of top hat functions, 
whose height gives the average value of the Ci over its width. The width of these top hats is chosen wide enough that 
correlations between them become small. Their method is of practical significance because their approximate ansatz 
for the likelihood is easy to evaluate. 

Our perspective is the following: we endeavour to provide an ab initio formalism which lays bare the statistical 
properties of the power spectrum coefficients. It then turns out that the insight gained from this study can be used 
to formulate unbiased and well-controlled methods for parameter estimation. 

To provide a quantitative basis for this discussion we present in this paper an efficient semi analytic calculational 
framework within which the statistical properties of the C; power spectrum coefficients can be studied for theories 
which generate a Gaussian CMB sky. This framework is exact for the important class of survey geometries and noise 



L(c?|thcory) = 




(3) 



v/(27r) N p- |S + N| 
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patterns which obey rotational symmetry about one arbitrary axis and is not limited to large sky coverage. This class 
contains both skies with symmetric and asymmetric cuts north and south of the Galactic equator, rings, annuli, and 
polar cap shaped regions of any size as indicated in Figure 2. For this class we obtain exact analytical solutions for 
the mean (C;), the variance ^(AC;) 2 ^>, the skcwness, kurtosis and all higher moments of the sampling distribution for 

all I. We derive exact and conveniently computable analytic expressions for the marginalised probability distributions 
of the cut sky Ci for all I as well as integral representations for the joint distribution of all C\. As an added bonus 
we find that our methods also allow dealing with arbitrary noise patterns to very good accuracy and conclude by 
illustrating their use in some applications. 

To test our results we perform state-of-the-art Monte Carlo simulations for a high resolution CMB satellite with a 
Galactic cut of size ±20° (resulting in a sky coverage comparable to COBE). We use a sine modulated noise template 
as a simple model of the noise pattern which would result from scanning along great circles through the ecliptic poles. 
The numerical results we obtain provide a solid check on our analytic calculations. 

The plan of this paper is as follows: in section II we lay out the general framework. Section III develops the 
statistical theory of power spectrum coefficients for observation regions which are azimuthally symmetric about some 
axis. We arrive at expressions for the cut-sky moments of any order and solve the moment problem to obtain the 
exact sampling distributions. We compare our analytical results to Monte-Carlo simulations in section IV. Several 
applications of our methods are presented in section V. Section VI contains further discussion and our conclusions. 
Appendix A discusses the effects of non-cosmological foregrounds on the results presented in this paper. Details of 
our analytical and numerical techniques are given in three further appendices. 



II. THE FRAMEWORK 



The full sky of CMB temperature fluctuations can be expanded in spherical harmonics, Yj m , as 

l m ax 

1=0 m 

where 7 denotes a unit vector pointing at polar angle 9 and azimuth 4>. Here we have assumed that there is insignificant 
signal power in modes with / > lmax ctnd introduce the convenient notation that all sums over vci run from —Imax 

to 

Imax but all quantities with index Im vanish for m > I. 

In a universe which is globally isotropic and contains Gaussian fluctuations, the a; TO are independent, Gaussian 
distributed with zero mean and specified variance cj eory = (|a; m | 2 ). Hence, for noiseless, full sky measurements, 
each measured C; independently follows a ^-distribution with 21 + 1 degrees of freedom and mean 

Qtheory^ Owing to 

Galactic foregrounds, limited surveying time or other constraints inherent in the experimental setup, the temperature 
map that comes out of an actual measurement will be incomplete. In addition, a given scanning strategy will produce 
a noise template: high noise per pixel results in regions where the scanning lines are less densely packed compared 
to regions where they are denser. We model the noise as a Gaussian field with zero mean which is independent from 
pixel to pixel and modulated by a spatially varying root mean squared amplitude Wat (7) * IF (7) (the factor of W 
appearing because the noise too is zero in unobserved regions). Therefore the observed temperature anisotropy map 
is in fact 

f (7) = VF( 7 ) [T( 7 ) + ^(7)^(7)] (5) 

where W is unity in the observed region and zero elsewhere. 

The coefficients in the spherical harmonic expansion recovered from such an incomplete observing region are 

~a Vm > = / «iny; m , ( 7 )T( 7 ) 

Jo 

= 5> Im f dr>y; m ,(7)y] m ( 7 ) (6) 

The notation " J" " denotes integration over the observed region. Note that the usual orthogonality property of the 
Ylm{l) does not hold any longer because we do not integrate over all solid angle. This becomes clearer if we define 
the geometric coupling matrix 



W V . 



ee f ^y; ro ,(7)F iro ( 7 ) (?) 

Jo 
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and write 



ai'm' = Wi'm' ImCLlm- (8) 
Im 

By definition, the Wy m i \ m are just the matrix elements of the W{^) window in a spherical harmonic basis. These 
matrices have been previously discussed in a similar context in the pioneering papers by Peebles and Hauser [19,20]. 

This means that expanding T as in Eq. (6) produces a set of correlated Gaussian variates a; m for the signal and 
a-N im for the noise. These combine into power spectrum coefficients 

Ci = ^^-j^2\ai, n + a N i m \ 2 (9) 

m 

whose statistical properties differ from the ones of the C;. We therefore refer to these quantities as pseudo-Ci. In 
what follows we will discuss the statistical properties of these quantities. 



III. EXACT PSEUDO-Ci STATISTICS 

Let us focus on the terms of the sum Eq. (9). They are squares of Gaussian distributed variates with zero mean. 
It follows that the Ci are sums of 21 + 1 x 2 variates with one degree of freedom. However, each term in the sum has 
a different expectation value and may in general be correlated, so the C\ are not x 2 distributed with 21+1 degrees of 
freedom. 

If we assume white noise, = C N , then each term in the sum has expectation value 

o lm = 2lTl (10) 

A few remarks are in order: 

1. Azimuthal symmetry has not been invoked to derive this relation. 

2. The cross-term which comes from expanding the square in Eq. (9) has vanishing expectation value because 
signal and noise are assumed to be uncorrelated. 

3. The sum over Ci includes the non-cosmological I = and I = 1 terms. The appearance of these terms merits 
some further discussion, which we relegate to Appendix A. 



A. Digression: connecting to the least squares estimator 

Summing Eq. (10) over m gives a set of linear equations which also results in a different context, namely after 
minimising the variance of the Ci estimator in Tegmark [17]. There it appears as a set of linear equations for the 
estimator, which the author assumes to be Gaussian distributed. 

We see from our treatment that this equation is only valid after the expectation values are taken. The quantities 
involved are quadratic in Gaussian variates which leads to the presence of a cross term which only disappears after 
taking the expectation value. 

This is the reason why some C; estimates in [17] turn out negative for values of I where the signal to noise ratio 
becomes of order unity. This is of course unphysical because the C; are properly quadratic and hence positive 
semi definite quantities. We observe that it is precisely neglecting this fact which leads to unphysical results. 

On the other hand, our treatment provides alternative proof of the fact that the least-squares estimator is unbiased. 
It also explains the significance of the Fisher matrix which appears in this context: is simply a measure of the geometric 
couplings between eigenmodes of total angular momentum. 
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B. Azimuthal symmetry 



To elucidate the correlation structure between the a; m , Eq. (8), we write Eq. (9) as a quadratic form 

<?< = 2zTT E ^ (ii) 

m l',m',l",m" 

in independent normal variates £; TO with zero mean and unit variance. The object M. describes the combined effect 
of statistical correlation and the geometrical couplings between the various expansion coefficients. 
For an isotropic Gaussian field on the full sky 

M[l\ i 2m2 = S Ul 5 mmi (cfT + Cl imse ) Su 2 5 mm2 . (isotropic, full sky) (12) 

We show in Appendix B that for a Gaussian field with an anisotropic but azimuthally symmetric weight function 
W{6), M is of the form 

M?Z\ i 2 m 2 = SmmML (pf^ + ^T") <i 2 <W> (anisotropic, cut sky) (13) 

where 

Wjtl = W Vm im = J dfj,W(fj,)Xi m {n)Xi> m {n), (14) 

and we assume identical windows for signal and noise for notational simplicity in Eq. (13). The generalisation for 
arbitrary azimuthally symmetric noise windows is straightforward. 

The structure exhibited in Eqs. (13) and (14) is the key fact which allows the derivation and cheap evaluation of 
the exact results we obtain. An efficient algorithm for the evaluation of the matrix elements of W is given in appendix 
C. For a maximum I of 1024 this takes just over one minute on a single R10000 CPU. 

It follows that while the 5; TO + ajy i m terms in Eq. (9) are correlated for different I, they will be independent 
for different m. As a consequence, under the assumption of azimuthal symmetry the are sums of independent 
Chi-squared (x 2 ) variates, each with one degree of freedom but different expectations. Below we succeed in finding 
analytical expressions for the probability density functions of such generalised \ 2 variates and we hence solve for the 
statistical properties of the pseudo-C; exactly. 



C. Pseudo— Ci moments 

The problem is to compute the statistical properties of a sum of independent variates, each of which has a known 
distribution. A useful tool for attacking this kind of problem is the method of characteristic functions [21]. Appendix 
D provides a heuristic introduction to this method. 

The characteristic function of a x 2 variate with one degree of freedom is given by 

(i-^fj*' (15) 

We can multiply them together for each term in the sum Eq. (9) and inverse Fourier transform to obtain p(Ci) as 

2 *J-°° nL- f (i-WJ' 

The log of the integrand in this expression (not including the complex exponential) is the cumulant generating function, 
which we can differentiate to obtain all cumulants of the pseudo-C; as 

K „ = 2"- 1 (n-l)!^(aD". (17) 

m 

We give the following expressions for the mean, variance, skewness Q\ = ^ Ac ^ \ and kurtosis B 2 = as 

((ACi) 2 >2 ((AC() 2 ) J 

examples: 

(Ci) = E m ^ ((AQ) 2 )=2E ro af m 

(18) 



MO) =^J_ ( < b ^, '. — - t (. '<>) 



p 1 = 2 i Z m ° lm ^ = 12^ 
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D. Pseudo— C'i distributions 



It turns out that we do not have to contend with just the cumulants. We will now derive the analytical form for 
the pseudo-Cj distributions. We define Sm — (2/of m ) and note that sS = s^} m . Then all terms with equal \m\ > 1 
pair up and we obtain 

A® r+°° e -ic t x 

= sr L V-^nueffi-*) (19) 

where A® = \J IlL=i s ™- We expand in partial fractions and integrate each term analytically (using a standard 
integral from [22]) to produce a closed form solution in terms of incomplete gamma functions -f(a,x): 

p(Q) = A' {1) V , V 1 V >- (20) 

where the primed product symbol J| only multiplies factors which have m^m! and the normalization constant is 



(0 

A , W ^ V -» ^*m=l _ (21) 



4/(0 _ V llm=l * 

~2T(!) 



We have, therefore, reduced the problem of determining the sample probability density of the pseudo-C; to com- 
puting the af m . The computation of these coefficients has to be done only once for a given survey geometry and 
model theory. In fact it is clear from their definition and Eq. (10) that once the w//^ are computed for a given survey 
geometry, the af m can be generated very easily given a model C; spectrum. 

E. Correlations between different Ci 

The distributions we have just derived are 1-point or marginal distributions. They contain complete statistical 
information about each C\ taken for itself, but they only account for the correlations between C; implicitly, giving an 
effective distribution for each C; which contains these correlations. To obtain explicit information about the full joint 
distribution of all C; we must go a step further. 

While we wish to relegate further study of the properties of the joint distribution to future work we will now mention 
some preliminary results. 

The characteristic function of the joint distribution of quadratic forms such as Eq. (II) has been studied in the 
literature [21]. Fourier transforming we can obtain an integral representation of the joint distribution (analogously to 
Eq. (16) for the 1-point distributions) 

r-n l f ni(e- iS,e, dS,) 

I \) (27r) , -»« J \t-2iJ2 l ,S l ,M l '\* 

Even though we cannot proceed further in the Fourier inversion or must take recourse to approximate methods, we 
can still gain useful information about the joint statistics of the set of Cj. Similarly to subsection IIIC we can derive 
the cumulants of the variates from the logarithm of the integrand (disregarding the complex exponential). It is easy 
to show that for the covariance between the pseudo-C; we can write 

(ACtACi^ =tvM {l) M {V) . (23) 

It turns out that higher order joint moments are computed similarly in terms of traces of products of M. W of various 
I. Note that these expressions are valid in general, but the special form of in the case of azimuthal symmetry 
will reduce the scaling of the computational cost for their evaluation. 
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F. Approximations for the non— symmetric case 

Returning to the case which we have fully solved, let us perform a reality check on the assumptions we have made. 
There are some important situations where the noise pattern does not follow the azimuthal symmetry of the survey 
geometry. In the case of the Planck satellite the scanning strategy is approximately centered on the ecliptic poles, 
while the Galactic cut is tilted through w 60° with respect to this. In this case Eq. (20) becomes an approximation. 
We found it to be very accurate indeed to continue using these distributions with the af m computed for an asymmetric 
Wn, even for a strongly asymmetric noise pattern. This approximation will be worst in the least interesting, noise 
dominated regime at very high I. Note that the af m and hence the (Ci) remain exact (because the Xi m are independent 
of the azimuth) but the remarks leading to Eq. (20) are no longer exactly true. For applications the final justification 
comes from the excellent agreement we find when we check against our Monte Carlo simulations which is illustrated 
in Figure 3. In fact all the applications we present in the following were computed using strongly asymmetric noise 
patterns. 

IV. MONTE-CARLO SIMULATIONS 

To test our formalism we performed 3328 Monte Carlo (MC) simulations for a high resolution CMB satellite, such 
as MAP or Planck (resulting in a sky coverage comparable to COBE). We simulated realizations of the CMB sky in 
the standard cold dark matter model(il m = 1, fi&Zi 2 = 0.015, Hq — 70km/s/Mpc). From these maps we carved out a 
±20° Galactic cut and contaminated the remaining area with spatially modulated Gaussian white noise of maximum 
root mean squared temperature 124/iK per pixel of characteristic size 3.4 arcminutcs. We use a tilted noise template 
Wn — V sm @e, where 9e is the ecliptic latitude, as a simple model of the noise pattern which would result from 
scanning along meridians through the ecliptic poles. We then Fourier analysed these maps and stored the resulting 
Ci- 

To give a visual impression of the resulting probability densities we show four cases in Figure 3. These cases were 
chosen to represent different regimes. The first case, I — 2 is especially sensitive to effects due to the cut sky and an 
important region for the overall normalisation of the power spectrum. Two cases for intermediate I (I — 200, I = 310) 
probe cosmologically interesting scales (on top of the first acoustic peak and in the first trough of the standard cold 
dark matter spectrum we use). At high I, I = 1000, there is a significant noise contribution. 

Next to our analytical results we show the histogrammed results from our MC simulations. A Kolmogorov-Smirnov 
test failed to detect deviations between the distributions of this MC population and Eq. (20) at 99% confidence, which 
validates our semi-analytic expressions. 

Also shown in Figure 3 are the \ 2 distributions which the C; would follow in the full sky case as well as the 
commonly used Gaussian approximation [5]. These are mean adjusted to account for the lost solid angle due to the 
Galactic cut. At I < 30 the difference is striking. For higher I the Gaussian approximation becomes better as higher 
moments die away by dint of the Central Limit Theorem, but there remain visible systematic differences to the true 
distributions. In particular, there is a residual shift in the mean and the approximations tend to be slightly narrower 
than the histograms for very high I w 1000. 

This becomes a more quantitative observation when looking at the percentage discrepancies between the mean and 
variances as a function of I in Figure 4. The discrepancy is of the order of 1 % in the mean and 5% in the standard 
deviation on most scales, except for I < 70 where the effect is larger. These discrepancies are important at the level 
of precision of future almost full sky missions. For medium and small sky coverage the mode couplings are stronger 
and we expect this to have an even larger effect on the probability distributions. We also compare the skewness f3\ 
and kurtosis /?2 of the \ 2 distributions to our distributions. The percentage difference is larger than for the first two 
moments but arguably less important at large I, since (5\ and fli decay as (21 + 1)~5 and (21 + 1) _1 , respectively. 

V. APPLICATIONS 

A. One— dimensional parameter estimation 

As a first application we study the effect of approximating the likelihood for parameter estimation. Since our 
distributions have the correct means, we simply multiply them together for a simple, unbiased approximation to the 
likelihood 

£=n^) ( 24 ) 

I 
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FIG. 3. The pseudo-C; sampling pdfs , Eq. 20 (shaded), the \ 2 (solid lines) and the Gaussian (dashed) approximations 
compared to Monte Carlo simulations (histograms) for 1=2,200,310,1000. The theoretical Ci were computed for the standard 
cold dark matter model(f2 m = 1, Q,bh 2 = 0.015, Ho = 70km/s/Mpc). Note that the Monte Carlo simulations were run with 
noise pattern which strongly broke the azimuthal symmetry assumption. We nevertheless find superb agreement with our 
analytical predictions. The C\ are in units of fj,K. 



This is a conservative approximation in the sense that we will not overestimate the estimation accuracy since the 
marginal distributions have all correlations integrated out and we will therefore overestimate the error bars on the Ci . 
Using this likelihood, as well as the Gaussian and \ 2 approximations, we attempt to estimate the baryon parameter 
fib (holding all other parameters constant) from several randomly selected realizations in our MC pool. The results 
are shown in Figure 5. As expected, Gauss and x 2 consistently find estimates which are biased about 1.6% high, 3 
standard errors of the mean away from the true value, while our likelihood gives a perfect fit. 

Since the moment discrepancies depend on the underlying cosmological theory, this level of bias will also depend on 
the true theory. Therefore, this number should only be taken as indicative of the general level of the error introduced 
by using the naive Gaussian or x 2 approximations. 



B. Multi— parameter estimation 

We demonstrated in the previous subsection that we can use our formalism to construct approximate forms for the 
likelihood which lead to an unbiased parameter estimate. The idea proposed in this subsection is motivated by the 
fact that higher moments of the Ci distributions die away quickly with I (ci. Eq. (18)) especially for observation 
geometries with large sky coverage. In particular we found in the case we studied that the distributions were visually 
indistinguishable from Gaussians for I > 100. For parameters which are sensitive to high / modes, we therefore further 
approximate Eq. (24) by replacing the pseudo-C; pdfs with Gaussians with the same mean and the same variance, 



c(ci) = n 



2>100 



(<*-<«»' 



AC? 



(25) 
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10 100 1000 

I 

FIG. 4. Percentage discrepancy of the first four moments of the pseudo-C; compared to the \ 2 approximation as a function 
of angular wave number I. The solid lines are, top to bottom, the percentage discrepancy between the mean, standard 
deviation, skewness, and kurtosis of the x approximation and the pseudo-C; distributions. The dashed line corresponds to 
0% discrepancy. The stars are the (C;) computed from 3328 Monte Carlo runs, showing excellent agreement only limited by 
Monte Carlo noise to better than 0.1 % for all I. 

In this approximation, maximum likelihood estimation has reduced to simple x 2 fitting, however with the correct 
means and variances which implicitly account for inter-C; correlations. 

3 

The operation count for each likelihood evaluation has now dropped to N^ ix . If the computation of the first and 
second pseudo-C; moments is organised in a memory-efficient manner the elements of the coupling matrix need to 
be computed only once. We find that we can evaluate Eq. (25) at a rate of several hundred times per hour. This 
surprisingly slow scaling of the computational cost with number of theories is presumably due to more efficient use of 
on-processor caching. 

To illustrate, we solve the problem of estimating 3 parameters (O c fif, and Hq) simultaneously from a sky with 
12 x 10 6 pixels. To compare with the naive Gaussian approach and to show that our method is unbiased, we compute 
maximum (approximate) likelihood estimates from 100 realisations of the sky and plot a representation of the the 
empirical distribution of parameter estimates in three dimensions in Figures 6 (naive x 2 ) an d 7 (our approach). 

We find again that our approach is unbiased. The distributions of the estimates are clearly centered on the true 
values. 

We stress that our approach avoids the usual difficulties of the Gaussian approximation as discussed in section III A. 
For example, even though we use the Gaussian approximation, which of course does not exclude negative C;, they are 
assigned an exceedingly small probability. This is because no attempt is made to subtract out the noise contribution 
from the pseudo-C; — instead it is modeled consistently and the (signal x noise) cross term which is present in each 
realisation is not allowed to dominate. 



C. Test for non— Gaussianity 

We now mention an application which reverses our way of thinking about the results in this paper and takes 
advantage of the exact analytical expressions we compute for the pseudo-C; distributions. Imagine that we have 
gained knowledge that a certain theory is correct, for example through measurements of the large scale structure of 
the galaxy distribution and other probes of cosmology such as supernovae etc. Then we can use the formalism shown 
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FIG. 5. One parameter likelihood estimates of Sib- Shown are the means (solid) within ±3 standard errors in the mean 
(dashed) of estimations from 17 realizations (diamonds) of the sky. Both the x 2 (left) and the Gaussian (right) approximations 
produce a bias, while our approximation (center) has no detectable bias. 



here as a test of Gaussianity, simply by computing the pseudo-C/ and then substituting them into Eq. (20). An 
interesting feature is that this probes Gaussianity scale by scale which might make it feasible to disentangle confusion 
effects if the non-Gaussian signal was only apparent on a small range of scales and masked by a Gaussian component 
on other scales. 



D. De— biasing 

In the introduction we mentioned the use of iterative methods to attack the difficult problem of power spectrum 
estimation. A good starting guess can drastically speed up the convergence of such methods. In the case of large sky 
coverage, where the biases which occur in the C; are small, we suggest the following recipe as a "poor man's power 

spectrum estimator" for de-biasing the power spectrum in only O (N^^j computational time: 

1. compute the G from the data, 

2. fit a smooth curve Cf mooth through them, 

3. compute Cf = (C; — C™ mse ) and set all negative Cf to zero, 

4. use the Cf instead of cf 1 ^" in Eq. (18) to compute (C/). 
Then 

G=G + (cr°° th -(G)) (26) 

is an estimator of the underlying theory C;. There is a tiny residual bias which is second order in the discrepancy 
shown in Figure 4. This is utterly negligible for upcoming satellite missions. 



VI. FURTHER DISCUSSION AND CONCLUSIONS 



We have computed the sampling properties of a set of quantities which we call pseudo-C/ . We have shown the use- 
fulness of these quantities for the easy compression of large Cosmic Microwave Background data sets as well as for the 
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forecasting of the discriminatory power of planned experiments. For example, in situation where percentage accuracy 
is required, the pseudo-C/ sample variances shown in Figure 4 could immediately replace the usual approximation 
which simply rescales the cosmic variances for the full sky by the sky fraction. 

On the other hand this exact framework can be used to justify or derive more approximate methods. We have given 
an example for the construction of an approximate but unbiased likelihood for cosmological parameters on the basis 
of a Gaussian approximation to the marginalised pseudo-C; probability distributions. Using this approximation we 
can evaluate the likelihood for hundreds of theories per hour on a single CPU and find unbiased estimates. 

If only a rough treatment of the C; statistics is sufficient, for example in the case of large sky coverage, the results 
derived here can be used as a justification for other simplifying assumptions which are already in use in the literature. 

Apart from the obvious applications to experiments with small and medium sky coverage such as balloons or 
ground-based missions, many further uses of this framework are conceivable. For example, one could use the sta- 
tistical framework to design 'optimal' scanning strategies, encoded in Wn, and assess more realistically if secondary 
anisotropics will be detectable with future CMB missions. Finally, all ingredients are there to refine the approximation 
to the joint likelihood we used in this paper by taking into account the covariances Eq. (23), for example by using a 
multivariate Edgeworth expansion around the peak of the likelihood. 

We note that numerical techniques have recently been developed [23] which allow the computational solution of 
the power spectrum estimation for high I applications under similar assumptions to the ones which lead to the 
analytical results derived in this paper. Their methods are efficient for the case of almost full sky observations. A 
purely numerical approach still requires significant computational resources, especially if there is signal in modes with 
I > 1000. The results presented in this paper should be seen as complementary to such calculations. An analytical 
framework admits a more fundamental approach to understanding and is a useful yardstick against which numerical 
work can be tested or from which approximate methods can be derived as we have demonstrated in the applications 
presented in section V. 

Our methods do not allow for significantly correlated noise. Due to the current state of detector technology, all 
CMB missions suffer from correlated noise, with the possible exception of the MAP satellite [24]. Apart from brute 
force numerical calculation with O (Np ix ) operations, which is no longer feasible even with current experiments, there 
are currently no techniques for computing sampling statistics in the presence of correlated noise. This clearly needs 
to be addressed as an extremely urgent and important issue. 

To summarize, we have presented a theoretical framework for the study of power spectrum statistics which is 
applicable regardless of the size of the sky area covered. We go beyond current approximations and present a semi- 
analytic formalism for the computation of sampling distributions of the C; for any Gaussian cosmological model and a 
large and important class of surveying strategies. We show that these results can be usefully applied to the estimation 
of cosmological parameters from simulated data. A range of further applications is suggested which demonstrate the 
power of our formalism. 
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FIG. 6. Multi-parameter estimation of fio, ^f> and Ho using naive x 2 fitting. The three panels show three views from different 
directions of the empirical distribution of the parameter estimates. Each sphere represents one bin of the three-dimensional 
distribution. The size and colour of a sphere indicates the number of realisations (out of 100 total) which led to parameter 
estimates within its bin. The true parameter values are at the origin of the coordinate axes. It is clearly visible that the 
distribution is shifted with respect to the true distribution by an amount which is inconsistent with the width of the distribution. 
In other words the true values could be ruled out at high significance if this estimate was used. 
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FIG. 7. Multi-parameter estimation of Qo, Qt, and Ho using our approximate likelihood Eq. (25). The three panels show 
three views from different directions of the empirical distribution of the parameter estimates. Each sphere represents one bin 
of the three-dimensional distribution. The size and colour of a sphere indicates the number of realisations (out of 100 total) 
which led to parameter estimates within its bin. The true parameter values are at the origin of the coordinate axes. It is clearly 
visible that the distribution is correctly centered on the true values. 
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APPENDIX A: LOW ORDER NON-COSMOLOGICAL MODES 



We assume in this paper that we have access to a partial sky map of CMB anisotropies free from non-cosmological 
signal. Even in the event that foreground contributions can be filtered out using the frequency information provided 
by multi-channel experiments, this assumption merits further discussion — particularly for large amplitude low order 
modes such as the monopole and the dipole which are thought to consist exclusively of non-cosmological signal. 

An ideal full sky map of the cosmic microwave background anisotropies has zero mean, by definition. Also, in 
Fricdmann-Robertson-Walker models, there is no cosmological dipole. But we expect realistic maps resulting from 
observations of the microwave sky to contain both monopole and dipole components; Galactic emission is positive 
definite and will therefore give a monopole contribution. The motion of our solar system with respect to the frame in 
which the CMB is at rest produces a dipole. The question arises how to deal with the presence of such non-cosmological 
low order modes. 1 

If the experiment measures total power, we may hope to remove a sizeable part of the monopole using the frequency 
information of a multi-channel experiment. In the case of a differential experiment such as MAP which is blind to the 
monopole component, the problem depends on the particulars of the map-making algorithm employed. Frequency 
information alone cannot distinguish between a non-cosmological and a spurious dipole since both are due to the 
microwave background. Studies of the peculiar velocity field may provide some information. 

We see the three following different approaches which can be used to deal with the problem of mode to mode 
coupling when non-cosmological components are removed from the sky. 

One is orthogonalisation [13]. This solves the problem, but is not feasible computationally when dealing with 
millions of degrees of freedom. 

The other is projecting out the undesired modes either by fitting and subtracting or more formally by operating 
with projection operators. Those approaches lead to a correlation structure for the remaining modes which is not 
analytically tractable in the manner presented in this paper. 

The third is allowing for the presence of a monopole and a dipole in a Bayesian fashion, by including a rough 
estimate of their size in the theory C;. In particular, a rough a priori measure of the monopole and dipole present in 
the data can be input into Co and C\ . Our statistical framework allows this and yields the correct predicted sampling 
statistics for all C taking into account all the l—V couplings due to the cut. In effect, Co and C\ are treated as 
nuisance parameters which are marginalised over in our formalism. 

This will change the sampling statistics of low order modes predicted by our theory in a self-consistent way. 
Cosmological parameters can then be estimated in an unbiased fashion without monopole or dipole removal. Whether 
this is feasible in practice would be an interesting avenue for further study. We arc not aware of any existing fast 
method which offers a similarly consistent treatment. 

In practice we observe that for large sky coverage the entries of the coupling matrices diminish rapidly with increasing 
I. To a good approximation, modes with I > 100 are therefore unaffected by the presence or absence of low order 
modes, I < 2. One of the main applications of our formalism has been to the estimation of parameters which depend 
mostly on the shape of the C spectrum around and beyond the first acoustic peak. These analyses are unaffected by 
the details of how low order modes of non-cosmological origin are dealt with. 

APPENDIX B: FACTORISING THE CORRELATION MATRIX 

For a general azimuthally symmetric window W(6) we can write the C as 

C = J2 I <KW(0i) f dQ 2 W(9 2 )Y lm ( ll )Y l * m ( l2 ) 

m 

Y ai'm'a;*'m" y im(7i)^"m"(72) 

I'm' l"m" 



1 We should mention that there is of course a natural occurrence of spurious monopole and dipole components even in a 
completely uncontaminated map as soon as a part of the sky is cut away. This is easy to understand: the monopole is only 
constrained to be zero if the anisotropy is integrated over the full sky. Similarly, a cut sky has a spurious dipole, even if the 
dipole was zero on the full sky. This statistical effect is fully taken into account in the formalism as it is presented here. All 
our Monte Carlo simulations of cut skies have monopole and dipole components even though they vanish exactly on the full 
sky. The sampling distributions of Co and Ci containing this effect are quoted in Eq. (20). 
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We define normalised associated Legendre polynomials \i m (%) in Appendix C and use the notation fi = cos 9. Then 
Yi m (9, <p) — A; m (/x)e 4m< ^ and we notice that the Kronecker-5 from the azimuthal integration select only one term out 
of the sums over m' and m", namely 

(OO \ / oo \ 

T, a t'mM'l) (T, a l"mM?l) (Bl) 
i'=o / \;"=o / 

where the 

Wjll = W Vm Irn = J d^W(p)X lm (p)X Vm (p) (B2) 

are real matrices. Because the W matrices are real, the terms in the brackets are complex conjugates of each other 
and we obtain Eq. (9) as desired. 



APPENDIX C: ALGORITHM FOR COMPUTING THE COUPLING MATRIX 



Subsection C 1 reviews some useful properties of spherical harmonics. In subsection C 2 we derive an efficient 
algorithm for computing the coupling matrix Eq. (B2) due to an azimuthally symmetric mask, Eq. (C12). More 
general windows can be piecewise defined in terms of masks of varying height. The corresponding coupling matrices 
are then suitably weighted sums over the coupling matrices due to the individual masks. 



1. Some properties of spherical harmonics 



The Spherical Harmonics are 

lWM)=Ai m (cos0)e im * (CI) 

where the 



Xlm(x) = J 2l A +1{ * , ^ 4(4 form>0 (C2) 
V 4-7T (I + my. 

hm = (-l) m A ;M , form>0, 
\ lm = 0, for |m| > I. (C3) 

In the following we will only consider positive m. The associated Legendre Polynomials P/ m arc defined as 

im 

P lm (x) = (-l) m (l - x 2 ) m/2 —Pi (x), (C4) 
where Pi is the Legendre function of the first kind defined as 

^H^* 2 -^ (C5) 

They are solution of the differential equation 

fp d / m 2 \ 

(1 _ x2) d^ Plm ~ 2x Tx Plm + + T^J Plm = (C6) 

Definition Eq. (C4) leads to a large number of relations between polynomials of different order I and m and their 
derivatives. Among them the following two will be useful (see, e.g. [22] §8.700) 

p lm = . ^(^i^ p _ ^—tjL p 

v 1 — x dx 

and 

(1 - x2 )^ P lm = -IxPim + + m)P(,_i) m (C8) 



1G 



It is possible to obtain the Pi m or alternatively the A; m through relations that raise from the fact that they are 
orthogonal polynomials. The following recurrence is stable and very convenient for numerical uses : 
starting with A o = 1/ V^r 



Amm 
A( TO +1) TO 

Aim 



2m 



2m 

x \/2m + 3 A 



- y/l - x 2 A 



(ro-l)(m-l) 



:i;A 



nun 
Xn 



(l-l)m 



(l-2)m 



A(l - l,m) 



A(Z,m) with A(/,m) = 



(C9) 



- 1 



m^ 



2. Scalar product on the cut sphere 

The spherical harmonics have the property that they are an orthonormal basis spanning the Hilbert space of square 
intcgrable functions on the sphere S 2 , i.e. their scalar product is the identity matrix: 



p2lX pi 

l «L 



(CIO) 

Let us now define the cut sphere scalar product of spherical harmonics under a mask or window W 

p2lX pi 

A?r'[w} = J d(t>J dcos0W(M)iW(M)iWM) (en) 

In the following we will consider only a sharp 'strip' filter whose boundaries are parallel to the equator (but not 
necessarily symmetric about the equator). 

W(cos6) = W(x) = 1 ifa<a;<6 

= otherwise (C12) 

This is simply related to physically relevant windows. For example a straight cut removing foregrounds concentrated 
around the Galactic disk is modeled by W = 1 — W with a = —b setting the width of the cut. Because A™™ [W] = 
1 - A]?™ [W], where 1 = 8 vl 8m'rm everything that follows can be translated to this window with the replacements 

b < a, and Az\ [W] = 1. (C13) 

Because the window Eq. (C12) is independent of <p it preserves the azimuthal symmetry and only spherical harmonics 
with the same m are coupled. The non-trivial terms of the coupling matrix A are then 

Afi = 2n [ dx\ Vm {x)\ lm {x) (C14) 

J a 

v /(2^ + l)(2^ + l) \ {l>-7n)\{l-my 

2 V {V + m )\{l + m)\ /; ' i(a ' h) (C15) 

where 

I]Jl(a,b) = [ dxP Vm {x)P lm {x) (C16) 

J a 

This expression is symmetric under interchange of V and I and under m — > —m. If < I < l max this implies that, 

asymptotically for large l m ax there are m ° x independent components of 7JJ](a, 6). If the window W is symmetric 
around the equator and therefore preserves reflection symmetry, only multipoles with the same parity of I couple to 
one another, and the number of non-trivial elements reduces to ~ l^ ax /12. 

The problem of evaluating all independent components of Eq. (C16) is to find a way of computing the integrals Eq. 
(C16) without having to take recourse to time-consuming numerical quadratures. 
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If one substitutes Eq. (C7) for Pi m and Pi> m m Eq. (C16), integrates by parts to exhibit the second derivative of 
P/( m _i), and makes use of the differential equation Eq. (C6) one obtains 

Ifi(a, b) = -(m + l)(m-l- l)I^-\a, b) 
+ (m - I - 1) [a;P ( /( TO _i)P;( m _i)] o 

+ (m + l- 1) [^(^-1)^(1-1)^-1)]* (C17) 

where we used Eq. (C8) to simplify the last two terms. 

This can be simplified further in the case I ^ I', by noting that the left hand side of Eq. (CI 7) is by definition 
symmetric in I' J whereas the right hand side is not explicitly symmetric. So, by subtracting Eq. (C17) from itself 
after swapping I and I' one obtains 

I^(a,b) - ((/' - [xP Vm P lm ] b a + {l + m) [PvmP { i-i )m ] h a - (I' + m) [P (l , -i )m Pi m ] b a ) 

I (V 2 + V - l 2 - Cj for I' ^ I. (C18) 

The diagonal terms {If 1 {a, b) = Pi m Pi m ) can be obtained thanks to the recurrence relation 

(Z - m)P lm = x{2l - l)P (J _ 1)m -(l + m- l)P (J _ 2 ) m . (C19) 
One obtains a recurrence on the diagonal terms involving also the off-diagonal terms computed by Eq. (C18). 

W)=f|^) fy^Vr-i (a, 6) 



21 + 1 J \l-m 

h\) (^) W.,C») - 4^V*<-« <C20) 

To conclude, the elements of the coupling matrix A are given in terms of recurrence relations as follows. For to 7^ to' 
we have 



while for I ^ Z' 



Afi = ( (/' - [*A,, m A, m ]> + y I^G 2 - ™ 2 ) [Ar m A (; _ 1)m ]^ 



2^T (/ - m } i^'-^lm] a j X (z/ _ 0(z/ + z + 1)i 



(C21) 



and all other cases are given by 



A m Am /2Z-ia + l) 2 -m2 21 + 1 G-l) 2 -m2 

= + V2TT3 Z 2 -m 2 - V2T^3 f2_ ro2 4(,- 3 ) d > "0 

4"» — /l" 1 " 1 4- ^ u\ \ i fc 



(C22) 



with A_J [W] = 0. As only a few operations are needed to evaluate each of these terms, calculating the whole coupling 
matrix costs 0(7^ lax ) operations and only requires about 10 minutes for Z max = 2048 on a fast work-station. 

APPENDIX D: THE METHOD OF CHARACTERISTIC FUNCTIONS 

1. Densities of functions of random variates 

How do we approach computing the probability density function (pdf) of a (set of) continuous random variate(s) 
which can itself be written as a function of one or more random variates whose pdfs are known? The usual expression 
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for changing the measure is 



P(f(x)) =p(x) 



df 



dx 



(Dl) 



where 



is the Jacobian of the transformation. However this expression is inconvenient if the transform is not 

monotonic or the number of transformed variates is not the same as the number of original variates. 

In order to attack this more general situation we replace Eq. (Dl) by a more powerful integral expression. The 
idea here is to define an "atomic" pdf which allows for only one event. For a continuous variate this is the delta 
function. The sum of these "atoms" over all possible outcomes, weighted with the probability that the underlying 
variates produce this outcome gives the resulting pdf. In symbols, if the n underlying variates X = {xi,i = 1, . . . , n} 
have the joint pdf p(X), then the pdf of z = f(X) is 



p{z) 



-I 



S(z-f(X))p(X)dX, 



(D2) 



where dX denotes n!=i d%i 



2. The Method of characteristic functions 



In particular we are often interested in the distribution of a linear combination of n independent quantities whose 
distributions are known. Computing these distributions is now straightforward - we can apply Eq. (D2) with the 
simplification that f(X) is linear and p(X) factorises owing to the independence of the variates. The trick is to 
perform the resulting convolution in Fourier space by application of the Fourier convolution theorem. This is far 
easier to do than computing n convolutions, at least if n > 2. The Fourier transform of a pdf of a variate is called 
its characteristic function — hence the name of the method. All major pdfs have tabulated Fourier transforms. This 
makes this method very simple to apply, up to the final step of inverse transforming to obtain the required pdf. This 
may be tabulated too. If not, extending the integrand to the complex plane and using contour integration perhaps 
after expanding into partial fractions may lead to success. In any case, at least only one integral needs to be evaluated 
compared to n for the direct convolution. 

Even if the inverse transform does not succeed, all is not lost. A key fact about characteristic functions allows us to 
gain as much information as we like about the statistical properties of z. By formally computing the Maclaurin series 
of the Fourier transform which defines the characteristic function c(k z ) of a variate z, we find that it is the generating 
function for its statistical moments, 

C (fc.) i!5iL. (D3) 

n=0 

Therefore, the moments can be obtained by straightforward differentiation of c(k z ). Equivalcntly, and sometimes 
more conveniently, the natural logarithm of the characteristic function generates the cumulants of the distribution. 
Moments and cumulants are algebraically simply related. Once these moments or cumulants are known there is a 
host of methods for using them to approximate p{z) [21]. In particular, when n is large, the large number theorem 
states that p(z) approaches a Gaussian with mean (z) and variance (z 2 ) (under weak conditions on the Pi{x{)). 
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